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1  Introduction 

Mucn  of  the  career  of  Herbert  Solomon  has  been  devoted  to  combining  the 
identification  of  important  applications,  the  building  of  models  for  these  applications 
and  the  use  of  data  to  estimate  the  parameters  of  the  models  or  to  develop  new 
models.  His  activities  have  covered  the  vast  territory  of  statistics,  operations 
research  and  the  social  sciences.  He  has  considered  problems  arising  in  such  diverse 
areas  as  logistics,  inventories  and  queueing,  quality  control  and  inspection,  learning 
and  human  factors,  packing  and  geometrical  probability.  It  is  in  this  eclectic  spirit 
that  we  consider  a  problem  area  involving  both  probability  modelling  and  statistical 
inference. 

This  paper  presents  a  natural  generalization  and  a  statistical  analysis  for  a  general 
and  important  class  of  stochastic  processes,  simple  Markov  population  processes 
(SMPP).  This  class  of  processes  is  broad  enough  to  encompass  simple  queues  and 
complex  queueing  networks,  repair  models,  manpower  models,  labor  mobility  and 
migration  phenomena,  and  many  others.  The  equilibrium  theory  for  the  SMPP  is  well 
known.  The  reader  should  consult  Kingman  (1969)  or  Kelly  (1979)  for  a  thorough 
treatment  of  the  probabilistic  aspects  of  this  tneory.  We  will  focus  on  statistical 
inferences  associated  with  random  parameter  versions  of  the  SMPP. 

Traditional  formulations  of  applied  probability  models  typically  emphasize  only  one 
of  the  many  possible  sources  of  random  fluctuations  that  may  influence  system 
benavior.  For  example,  extensive  treatment  has  been  given  to  a  wide  variety  of 
queueing  modeis  for  congestion  at  service  facilities,  in  these  models,  some  simple 
arrival  or  demand  process  confronts  a  given  service  process  often  presumed  to  nave 
u.d.  random  variaoie,  or  possibly  Markovian,  character.  The  parameters  of  the 
component  arrival  and  service  processes  are  nearly  always  assumed  to  be  few  in 
number,  given  and  fixed.  Consequently,  the  predicted  waiting  times  and  queue 
tengtns  vary  omv  m  response  to  the  ,nnerent  variability  of  the  arrival  and  service 
processes  if  tne  parameters  of  the  latter  are  also  fixed  and  known.  It  may  be 
argued  that  sucn  internal  or  witnm  variability  is  not  tne  omv  source  to  be  considered 
under  many  circumstances:  the  fixed  rates  may  be  expected  to  change  occasionally, 
and  additionally  may  well  be  unknown.  Similarly,  inventory  control  models  typically 
assume  tnat  parameters  of  demand  distributions  a-e  fixed,  as  do  reliability- 
redundancy  studies  of  fauure-prone  repairable  systems,  and  the  compartment  models 
of  pharmacology. 

Perhaps  for  reasons  of  mathematical  tractaoiuty.  tnere  has  been  far  less  attention 
paid  to  models  having  time-dependent  parameters,  where  the  time  dependence  is 
either  deterministic  or  'random.'  Recently,  nowever,  models  representing  random 
environments  or  douDi.  stocnastic  effects  nave  appeared  in  the  literature.  Such 
models  should  be  useful  m  system  studies  for  descnu  ng  realistic  situations  in  which 
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parameter  values  vary  widely  because  of  weather  or  other  natural  environmental 
influences  or  because  of  the  impact  of  client,  patient,  or  opponent  action,  or  various 
other  exogenous  influences.  Thus,  the  traditional  models  should  be  extended  by 
incorporating  external  or  between  variability  components.  Because  it  is  often  natural 
to  suppose  first  that  the  basic  parameters  of  the  model  of  interest  are  drawn  from  a 
given  (with  unknown  parameters)  superpopulation,  we  refer  to  such  models  as 
hierarchical. 

The  objective  of  this  paper  is  to  formulate  and  solve  problems  of  statistical 
inference  for  hierarchical  models  which  arise  in  the  context  of  Markov  population 
processes.  We  will  assume  that  we  are  given  an  observation  of  the  sample  path 
from  a  set  of  Markov  population  processes.  Each  of  the  processes  is  governed  by 
its  own  set  of  parameters.  The  vectors  of  parameters  for  each  of  the  processes  are 
assumed  to  be  drawn  i.i.d.  from  a  parent  "superpopulation"  distribution  which  may 
itself  have  unknown  parameters.  The  goal  is  to  estimate  the  parameters  of  the 
superpopulation,  because  these  describe  the  population.  Using  this  information,  we 
wish  to  estimate  the  parameters  of  each  of  the  Markov  population  processes  to 
make  inferences  about  a  particular  instance  of  the  system,  population  or 
compartment.  if  positive  indications  are  present,  it  will  be  possible  to  pool 
information  from  other  processes  to  improve  the  estimates  of  one  in  particular,  i.e., 
to  "borrow  strength"  in  John  Tukey's  words.  On  the  other  hand,  we  seek  approaches 
which  are  "discrepancy-tolerant"  or  "robust,"  i.e.,  which  do  not  unjustifiably  pool  data 
when  counter-indications  are  presented  by  the  data;  Gaver  (1985). 

The  general  inference  problem  described  seems  to  fall  into  the  category  of  a 
standard  Bayesian  analysis  or,  more  ambitiously,  an  empirical  Bayes  or  Bayes- 
empirical  Bayes  analysis.  The  reader  might  consult  Morris  (1983)  for  a  review  of 
parametric  empirical  Bayes  methods,  Robbins  (1983)  and  in  much  previous  work,  has 
elucidated  the  non-parametric  Bayes  approach,  or  Deely  and  Lindley  (1981). 
Surprisingly,  scant  attention  seems  to  have  been  paid  to  empirical  Bayes  methods  for 
stochastic  processes. 

This  paper  is  organized  as  follows.  We  introduce  the  notion  of  Markov  population 
processes  in  Section  2.  Section  3  gives  several  examples  of  situations  in  which 
random  parameter  versions  of  Markov  population  processes  are  important.  Section  4 
develops  a  likelihood  function  and  Bayesian  statistical  inference  for  these  processes. 
Section  5  briefly  discusses  empirical  Bayes  approaches.  Section  6  points  out 
important  topics  for  future  research,  mucn  of  which  is  currently  in  progress. 
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2  Markov  Population  Processes 

We  are  concerned  with  Markov  population  processes  in  n  dimensions.  Such  a 
process  {£(t),t  }  0}  is  a  continuous  time  Markov  process  with  state  space  S  = 

{(ivi2 in).  ij«Z+,1  i  j  i  n).  The  components  of  the  state  space  correspond  to  the 

occupancy  of  each  of  the  n  populations  or  colonies.  The  possible  transitions  are 

defined  as  follows.  Let  T  (5)  =  Tij(s|,...,sn)  =  (s|...,s|_1,s|-1.si+1,...,sJ+1 sn).  The 

transition  from  state  s  to  T^g  corresponds  to  a  migration  of  an  individual  or  unit  in 
population  (colony)  i  to  population  j  .  For  a  simple  Markov  population  process  the 
flow  rate  for  such  transitions  can  depend  on  §  only  through  s(  .  We  specialize  this 
requirement  further  by  requiring  the  flow  rate  to  have  the  form  X ,f (J(s()  ,  1  i  i.j  i  n, 

i  *  j  • 

We  also  allow  transitions  from  the  outside  into  one  of  the  colonies.  Let  Toj(g) 
correspond  to  an  arrival  from  the  outside  to  colony  j.  This  transition  has  flow  rate 
X Qf 0j^s j)  and  depends  on  s  only  through  s.  Finally,  there  can  be  departures  from  a 

colony  to  the  outside.  We  define  T  (s)  =  (s, s -1,...,s J.  The  transition  from  s  to 

T  (s)  has  flow  rate  Af  (s ). 

10  f  I O  l 

The  functions  {f(J(k),  0£i$n,  OjJj^n}  can  often,  but  not  always,  be  thought  of  as 
known  structural  parameters.  If  one  is  studying  a  Markov  population  process  such  as 
a  queueing  network  or  compartmental  model,  these  parameters  are  determined 
directly  from  knowledge  of  the  structure  (e.g.,  the  compartment  connectivity  or  the 
number  of  servers  at  a  node  and  the  flow  of  traffic  among  the  nodes).  The  rate 

parameters  A  *  . An)  are  generally  not  known  and  must  be  estimated  from 

data.  Several  comments  are  in  order.  First,  a  single  unknown  input  parameter  may 
not  be  sufficient,  as  there  could  be  flows  into  different  colonies  whose  relative 
magnitudes  are  not  known.  The  formulation  can  be  easily  extended  to  allow  for  as 
many  as  n  such  parameters  (one  for  each  colony),  but  we  do  not  do  so  here. 
Second,  one  must  introduce  conditions  on  A  and  tne  structural  parameters  jointly  to 
ensure  that  an  equilibrium  distribution  exists.  Indeed,  if  an  equilibrium  distribution 
were  to  exist,  it  would  be  of  product  form;  see  Kelly  (1979).  Fortunately,  the 
existence  of  an  equilibrium  is  unnecessary  for  our  analysis.  There  is  no  need  for  tne 
process  to  be  in  equilibrium  or  to  even  have  an  equilibrium  distribution.  We  merely 
observe  a  part  of  the  sample  path  and  then  develop  estimates  even  if  the  process  is 
transient.  Similarly,  it  is  usual  to  assume  the  state  space  is  irreducible,  but  this  is 

not  necessary  for  our  purposes.  For  this  reason,  we  define  S  =  { <i , . in),  i^Z*,  1  i 

1  i  n}  and  do  not  address  the  possible  boundedness  of  certain  components. 
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3  Example  of  Simple  Markov  Pgpylaiign  Processes 

We  present  several  examples  to  illustrate  a  few  of  the  important  applications  of 
Markov  population  processes.  We  also  describe  the  accompanying  inference 
problems. 

3.1  Simple  Markovian  Queueing  Systems:  M/M/C 

in  this  example,  we  let  n  *  1.  The  state  variable,  x.  corresponds  to  the  number  of 
customers  m  or  awaiting  service.  The  parameter  Xq  corresponds  to  the  arrival  rate, 
while  is  the  service  rate.  The  structural  parameters  are  given  by  fQl(x)  «  1  and 
flQ(x)  *  min  <x.C)  where  C  is  the  number  of  servers.  It  is  likely  that  either  XQ  or  X1 
or  both  are  unknown  and  must  be  estimated  from  data.  Not  only  may  Xq  be 
unknown,  but  it  can  also  exhibit  fluctuations  over  time.  e.g..  from  day  to  day.  This 
is  especially  true  in  service  systems  where  demands  vary.  It  is  reasonable  to  gather 
data  over  relatively  short  periods  of  time  during  which  Xq  and  X(  can  be  considered 
to  be  constant.  By  introducing  a  superpopulation  distribution  over  the  possible 
(X  X ,)  pairs,  one  can  use  these  sample  path  fragments  to  estimate  the 
superpopulation  parameters  and  the  particular  (X^X^  realizations.  Finally,  the 

superpopulation  distribution  can  be  used  to  carry  out  a  complete  system  performance 
analysis. 

3.2  Compartment  Models  in  Pharmacology 

Compartment  models  offer  a  broad  class  of  models  often  used  to  represent  the 
movement  of  drugs  or  pollutants  through  a  system  sued  as  tne  human  body.  The 
compartments  correspond  to  pools  or  tissue  groups  such  as  the  blood  stream  or 
body  organs.  Stocnastic  compartment  models  are  often  equivalent  to  an  open  or 
closed  Jackson  network  of  infinite  server  queues.  The  compartmental  structure  is 

usually  defined  by  an  n  x  n  transition  matrix  P  =  (ptj)  with  p(j  i  0,  jE  p(  i  1  and 

Po  =  ^-P.y  The  structure  functions  are  taken  to  be  f(j<xt)  *  p^x,,  0  $  j  $  n.  If  Xq  =  o 

and  pio=  0,  1  i  i  i  n,  then  the  system  is  closed.  Otherwise,  it  is  open.  We  assume 
P  is  known,  but  tne  analysis  can  be  carried  out  if  we  replace  Xp  x  by  X  x  where 
X  *  X  p  must  be  estimated. 

ij  i  nj 

it  has  been  commented  on  by  Koch-Weser,  as  quoted  by  Wagner  (1975),  that  "drug 
dosages  needed  for  optimal  therapeutic  effects  differ  widely  among  patients.  The 
'usual  dose’  of  most  potent  drugs  accomplishes  little  in  some  persons,  causes 
serious  toxicity  in  others,  and  is  fully  satisfactory  in  a  few."  This  observed 
variability  between  patients  strongly  indicates  that  the  model  rate  parameters  X  s 

(a  ,X1 . Xn)  correspondingly  exhibit  substantial  variability.  We  imagine  that  each 

individual  draws  a  X  from  a  superpopulation  distribution,  and  observe  a  sample  path 
from  each  of  the  compartment  processes.  One  goal  is  to  estimate  the  parameters  of 


'.VjvVrV  '  ■ y  v, V^v.v'-.'y  •- 
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the  superpopulation  distribution,  because  this  determines  the  variation  between 
members  of  the  population.  In  addition,  it  is  important  to  estimate  each  of  the 

individual  X  values,  since  they  may  be  related  to  patient  pathology  or  classification. 
Knowledge  of  the  between  variability  may  be  used  to  strengthen  estimates  of 
individual  X  values. 

3.3  Logistic  Support  for  a  System  Depending  Upon  Repairable  Modules 
Successful  operation  of  each  of  a  set  of  I  vehicle  systems  (e.g.,  trucks,  rental  cars, 
airplanes,  or  ships)  depends  upon  the  operability  of  important  subsystems  or  modules 
(tires,  engines,  communication  and  navigation  subsystems).  Suppose  modules  are 
failure  prone  but  repairable,  and  that  each  module  type  that  is  on  a  vehicle  in 

operation  fails  independently  at  an  (unknown)  Markovian  rate  Xj|#  where  j  refers  to  a 
module  of  type  j,  1£j£J,  and  i  refers  to  the  ith  vehicle.  1£i$l.  Let  the  Markovian 

repair  rate  for  modules  of  type  j  be  p  (unknown),  meaning  that  repair  times  are 

independently  exponentially  distributed.  Suppose  that,  in  addition,  there  are  Mj 
modules  of  type  j  on  each  vehicle,  and  that  the  more  that  are  up  or  operable,  the 
more  effective  is  the  overall  system  operation.  In  addition  there  are  S;  spare 
modules  of  type  j  in  the  system,  and  a  repair  system  that  contains  Rj  service 

facilities  (repair  teams,  equipment;  spare  parts  and  tools  are  considered  separately). 
Let  X^t)  represent  the  number  of  type  j  units  up  and  either  installed  on,  or  awaiting 

installation  on,  all  systems  1£i£l  at  time  t.  Then  under  additional  stipulations 

concerning  the  service  protocol,  (Xj|(t),  t£0)  is  a  njiultivariate  birth  and  death  process. 
The  total  number  ofj  modules  in  the  system  is  Z  (M  +  Sj),  the  total  number  of 

service  facilities  is  I  ^  Rj(  and  the  relevant  problem  is  to  specify  near-optimal  values 
of  S,  and  R  when  data  are  available  to  estimate  the  rates  X  when  u  and  a  suitable 

>i  ~  j 

measure  of  effectiveness  are  specified. 

4  Statistical  Inference  for  Markov  Population  Processes 

We  first  develop  the  likelihood  function  for  simple  Markov  population  processes. 
Later  in  this  section  we  consider  Bayesian  inference. 

4.1  The  Likelihood  Function 

A  Markov  population  process  behaves  in  a  simple  fashion.  Suppose  at  time  0  it  is 
in  state  s.  It  will  remain  in  state  s  for  an  exponential  period  of  time  with  rate 
parameter 

R(s)  =  XI  f  (s  >  -  X  X  I  f  (s  I 

°,=  i  o'  ■  ..  ,  i  po  ij 

n 

where  f  (s)  =  I  f  (s)  and  f  's  ■ 

O  -  1  O'  - 


n 

=  X  f  (s)  *  I  x  f  (s  ) 

O  C  -  |=  -  I  I  i 

=  X  f  is  >. 

I  =  C 
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Of  course,  R(§)  is  the  rate  at  which  the  process  leaves  state  §. 

When  this  exponential  period  ends,  the  process  moves  to  a  new  state.  These 
transitions  are  specified  for  1  i  i  i  n  and  0  £  j  £  n  by 

fQi(§)  with  probability  X 0f 01/R(s>, 

T^ls)  with  probability  X,f  (a )IR(s). 

The  data  from  the  sample  path  can  be  reduced  to  a  sequence  of  states  and  sojourn 
times  in  those  states.  This  can  be  written  as  (s^’.SJ.fs^’.S, ),..., (s<m),  S  )  where  s(t)  is 
the  tth  state  occupied  by  the  process.  To  remove  certain  minor  difficulties,  we  will 
assume  that  s(1)  is  not  random  but  is  deterministic.  It  is  possible  to  assume  that 
the  initial  state  is  stochastic  and  is  chosen  by  some  distribution  (usually  the 
equilibrium  distribution,  if  one  exists).  We  do  not  even  assume  the  existence  of  an 
equilibrium  and  so  simplify  matters  by  taking  sm  to  be  deterministic.  We  have  not 
described  the  sampling  interval  [ 0. T ]  and  prefer  to  leave  it  unspecified.  T  may  be 
deterministic,  in  which  case  the  number  of  transitions  (m-1)  is  stochastic. 
Alternatively,  one  could  observe  the  process  until  m-1  transitions  have  occurred,  in 

m 

which  case  T  =  Z  S,  is  stochastic.  The  likelihood,  however,  would  not  differ 

t=i  1 

substantially. 

Thus,  the  sojourn  times.  S. . contribute  a  factor  of 

1  m 

ri  R(s"')  n  exp(-S,R(s(t>)) 

1  -  ’■  “  t  =  •  1  “ 

to  the  likelihood  function  for  given  X  .  The  state  transitions  also  contribute  to  the 
likelihood  function.  If  the  transition  from  s!k)  to  s<k  +  1>  involves  a  departure  from 
colony  i  to  colony  j,  0£j$n,  a  term  X  f  j(si(k,)/R(sik))  is  multiplied  into  the  likelihood 
function.  A  term  X  f  (s  <k,)/R(s<k))  is  included  if  an  individual  arrives  to  colony  i  from 

the  outside.  We  let  Mo  =  the  total  number  of  arrivals  from  the  outside,  and  Mj  =  the 

total  number  of  departures  from  colony  j  As  a  consequence,  the  likelihood 
function  is  proportional  to 

nn  n  M 

exp<-  I  S ,R<sU)»  n  Xt  k.  (4.1) 

1=1  1  ~  k  =0  k 


The  quantity 


i  S  R<s(,))  *  Z  S.  [i  X  f  <s<»>]  =  I 

:  =  ‘  1  t  =  i  1  .  =  c  ~ 


s,f  (slt,>  =  I  x  w 

=  o  1  1 
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There  are  many  directions  for  future  work,  some  of  which  are  currently  being  taken. 
Models  that  permit  the  between  component  of  variability  to  arise  from  a  multivariate 
stochastic  process,  e.g.,  multivariate  log-Gaussian  process  rates,  are  attractive  when 
endogenous  influences  may  occur  in  a  time-series-like  manner,  as  may  be  true  of 
weather  or  economic  effects.  Hyperparameter  estimation  and  model  diagnosis  again 
present  problems.  Such  problems  promise  to  require  the  computer-intensive  activity 
that  characterizes  much  of  modern  statistics  and  operations  research.  It  is  our  hope 
that  the  results  will,  in  spirit,  resemble  the  various  interdisciplinary  statistical  efforts 
of  Herbert  Solomon,  to  whom  this  paper  is  dedicated. 
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When  a  must  also  be  estimated,  we  must  introduce  a  second  set  of  likelihood 
equations. 


0  i  j  in. 


(5.5) 


Equations  (5.3)  and  (5.5)  must  be  solved  simultaneously  for  a  and  §.  This  problem 
is  the  multivariate  version  of  one  discussed  by  Deely  and  Lindley  (1981).  Indeed  the 
multivariate  problem  separates  into  K  univariate  ones. 

5.2  The  Log-normal  and  Log-student  Superpopulation 

In  order  to  estimate  the  superpopulation  parameters  (£,Z)  or  (£.4)  in  the  normal  and 
student  cases  an  integrated  likelihood  must  be  formed,  analogous  to  that  for  the 
gamma  model.  There  is  no  way  of  avoiding  approximation  or  numerical  integration 
for  trial  parameter  values,  followed  by  a  search  of  some  sort  to  locate  the  maximum 
likelihood  estimates  <£,Z  )  in  the  normal  case,  or  <£,Z  ,k)  in  the  Student  case.  Such  a 
program  has  been  carried  out  and  tested  in  the  univariate  case,  both  on  simulated 
and  observational  data,  see  Gaver(l985).  The  integration  was  conducted  by  taking  a 
preliminary  Laplace  method  (quadratic  approximation  to  the  log-posterior)  approach, 
with  correction  furnished  by  Gauss-Hermite  integration.  Such  a  procedure  appears 
fairly  satisfactory  even  for  the  Student  superpopulation,  although  the  latter 
sometimes  admits  two  posterior  modes.  Sucn  a  program  becomes  far  more 
ambitious  in  the  multivariate  situation  addressed  in  this  paper,  and  further 
approximations  may  well  be  required  in  order  to  reduce  computing.  Of  course,  some 
assessment  of  the  sampling  errors  associated  with  superpopulation  estimates  will 
also  De  desirable.  It  is  likely  that  bootstrapping  will  be  useful,  and  some 
experiments  in  the  univariate  Poisson-log-Student  case  have  already  shown  its 
potential. 


6  Summary  and  Further  Remarks 

We  have  presented  here  an  enhanced  version  of  a  quite  general  familiar  and  useful 
stochastic  model.  The  enhancement  recognizes  between-version  variation  in  process 
parameters;  sucn  may  be  the  result  of  endogenous  influences.  We  have  then 
addressed  the  problem  of  process  parameter  estimation  by  characterizing  the  between 
variability  component  with  the  aid  of  parametric  superpopulations.  In  particular,  it 
has  been  shown  that  the  familiar  linear  shrinkage  effects  often  encountered  in  Bayes 
analyses  using  pnors  of  conjugate  form  are  interestingly  modulated  when  longer- 
tailed,  discrepancv-toierant  priors  or  superpopulations  are  introduced. 


5.1  Conjugate  Gamma  Suparpopulation 

In  the  previous  section,  we  found  that  the  conjugate  gamma  prior  offered  an 
especially  tractable  estimation  solution.  We  did.  however,  assume  the 
hyperparameters  were  known.  Now  we  consider  estimating  them  directly  from  the  K 
sample  paths.  This  is  done  by  finding  the  distribution  of  a  sample  path  conditional 

only  on  the  hyperparameters  a  and  §.  that  is  with  X  «  (XQ . Xn)  integrated  out.  This 

takes  on  a  multivariate  independent  negative  binomial  form. 


a  +w , 

I  I) 


We  now  compute  maximum  likelihood  estimators  of  a  and  §  given  the  data  from 
the  K  sample  paths,  . 

The  log-likelihood  function  for  a  and  §  is  given  by 


where  H(x)  *  log  Hx). 

The  expression  in  <5.2)  must  be  maximized  over  a  and  g. 

Let  us  assume  for  the  moment  that  a  is  treated  as  known,  hence  only  §  must  be 
estimated.  This  is  done  by  solving  the  likelihood  equations. 

K 

0  =  Z  (a. IB  -  U-MWW*)),  0  i  i  i  n.  (5.3) 

k= 1  11  iii' 


K  K. 

We  assume  that  both  Z  Wk  and  Z  Mk  are  positive.  The  case  in  which  both  are  0 

k= 1  1  k=  1 

can  be  handled  separately  in  a  straightforward  fashion.  It  can  arise  only  when  there 
is  no  activity  in  a  particular  colony  in  all  the  K  sample  paths. 

Equation  (5.3)  cannot  be  solved  in  closed  form;  however,  it  does  have  a  unique 
solution  which  can  be  found  numerically.  This  can  be  seen  by  rewriting  (5.3)  for  a 
particular  i  as 

«K  =  Z  (vMfM^/^Wf)).  (5.4) 


Jhe  right  side  is  monotone  increasing  in  B ■  It  is  0  for  «  0  and  increases  to  a(K  - 
Z  Mk  >  A  K  as  fi  approaches  infinity,  hence  there  is  a  unique  root. 

k*  1  ' 
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We  define  Z  =  2k(1+Q)4/(n+1«-k)  and  rewrite  (4.14)  as 

-Z'1^-*)  -  D1  ♦  ftl.  (4.15) 

It  should  be  noted  that  (4.15)  is  identical  to  (4.7)  which  arises  with  the  log-normal 
superpopulation  except  that  in  (4.15)  Z  is  a  function  of  *  through  Q. 

The  second  derivative  of  (4.14)  can  be  written  as 

-  O  -  [l'1  -  I'V^MivPI-’/ta^k)].  W.16) 

It  is  now  possible  for  there  to  be  multiple  solutions  of  the  likelihood  equations 
Q  =  -Z-M*-*)  -  D1  +  M .  (4.17) 

and  there  can  be  more  than  one  local  maximum  in  the  posterior  distribution.  The  use 
of  a  posterior  mean  estimate  becomes  questionable.  It  will  minimize  a  mean  square 
error  criterion  but  will  tend  to  give  an  estimate  between  the  two  if  such  modes 
exist.  An  alternative  approach  is  to  take  a  data-based  initial  estimate  of  t.  *(oj  = 
loglMjMT).  The  *(o1  is  used  to  compute  an  initial  Z,o).  Equation  (4.17)  is  replaced  by 

Q  =  -  D1  *  M  .  (4.18) 

which  is  a  particular  case  of  (4.17)  and  wruch  has  a  unique  solution.  This  solution 
often  gives  useful  results  especially  when  dispersion  between  individual  rates  is 
large;  see  Gaver  (1985).  The  matrix  4  has  been  weighted  by  an  initial  discrepancy 
factor  2k(l*Q)/(n+1+k).  When  this  factor  is  large,  then  the  shrinkage  toward  a  is 
reduced.  This  is  a  desired  effect  for  the  multivariate  log-Student  t  superpopulation 
which  has  heavy  tails.  Values  of  #  far  from  the  mean  are  possible,  consequently  in 
such  cases  the  procedure  refuses  to  "borrow  strength”  when  it  is  unwarrented.  Such 
a  procedure  is  "discrepancy-tolerant"  or  robust. 

6  Empirical  Saves  Methods 

In  this  section,  we  discuss  the  empirical  Bayes  approach  to  these  problems.  We 
consider  the  previous  formulations  but  assume  that  the  superpopulation 
hyperparameters  are  not  known.  They  must  be  estimated  from  all  the  observations. 


mmmms 
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relatively  short-tailed  gamma  and  log-normal  distributions,  even  if  the  parameters  of 
the  latter  are  obtained  from  expert  judgement.  Such  outliers  may  be  the  consequence 
of  recording  errors  or  they  may  be  the  manifestation  of  unsuspected  influences. 

To  obtain  information  about  the  joint  or  marginal  posterior  for  «  and  hence  for  X^ 
*  expUj),  it  is  necessary  to  resort  to  numerical  integration.  It  has  been  found  that 
an  initial  Laplace's  method  approximation,  for  which  one  should  consult  Mosteller  and 
Wallace  (1964)  or  Kadane  and  Tierney  (1984),  followed  by  a  few-point  Gauss-Hermite 
integration,  can  be  quite  effective;  see  Gaver  (1985)  for  a  discussion  of  the  univariate 
simple  Poisson  case.  By  this  approach,  one  can  assess  the  sampling  error  of  the 
point  estimate  achieved  from  the  model  estimate  suggested  above.  One  can  also 
compute  alternative  estimators  such  as  the  perennial  favorite,  the  posterior  mean,  or 
a  weighted  version  thereof. 

4.3.3  Multivariate  Log- Student  t  Prior 

In  order  to  recognize  the  possible  existence  of  rates  of  greater  discrepancy  than 
admitted  by  a  gamma  or  log-normal  superpopulation,  we  introduce  the  multivariate 
log-Student  distribution.  This  superpopulation  has  longer  tails  than  the  normal,  and 
hence  should  yield  interesting  estimates  for  the  rates.  It  can  be  obtained  by  scale 
mixing  the  multivariate  log-normal.  A  formula  for  the  multivariate  t  appropriate  here 
is 


f<i!*,A.k>  *  Cp+1(Det(A)-1/2(UQ(£))(n"'+k,/2 


(4.12) 


where  Q  «  Q(<)  *  &  is  a  covariance  matrix,  k  is  a  degree  of  freedom 

parameter,  and  Cn+ ,  is  a  normalization  constant.  See,  for  example,  Mardia,  et  at 
(1979),  page  57  and  associated  references  therein. 

in  order  to  determine  the  modes  of  the  posterior  distribution  we  proceed  as  before 
by  examining  the  log-posterior  for  X  or  equivalently  for  «.  By  omitting  irrelevant 
constants  we  find 

log  f(r!*,X)  =  -  (n+1+k)log(1+Q)/2  -  XTyv+  »T-M.  (4.13) 


The  first  derivative  of  (4.13)  with  respect  to  »  is  given  by 


I 


-tn-1-k)A'  V-^/ttkO+Q))  -  D1  ♦  . 


(4.14) 
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-l-Hi-tf)  «•  M-  01  (4.7) 

where  0  is  a  diagonal  matrix  with  entries  W^xpif,),  and  1  is  a  column  vector  of 
ones. 

The  second  derivative  of  (4.6)  with  respect  to  <  is  given  by 

-I"1  -  D  (4.8) 

The  matrix  -  (Z~ 1  +  0)  is  clearly  negative  definite,  thus  one  can  find  the  unique 
posterior  mode  by  solving 

0  =  -  *  M-  D1.  (4.9) 

Equation  (4.9)  cannot  be  solved  in  closed  form,  but  a  numerical  solution  may  always 
be  obtained  by  a  Newton-Raphson  iteration  approach.  As  a  starting  solution,  one 
might  use  the  vector  of  logarithms  of  the  raw  rates,  i.e.,  «*o)  «  logOVyw^).  In  case 
small,  occasionally  zero,  values  occur,  one  might  replace  Mj  by  Mj+0.5.  The 
Newton-Raphson  will  take  the  form 

,<^i>  s  £< l )  .  (Z-UD)-’  -  *  -  M*  01)].  (4.10) 

If  the  suggested  starting  value  * ^  =  logOVIj.'Wj)  is  used,  the  first  iteration  leads  to 
£<1)  =  £(o)  .  (Z -1+D)~1Z',(rlo,-ii).  (4.11) 

This  improved  estimate  of  »  is  only  a  first  approximation  to  the  true  solution,  but  it 
has  a  familiar  form  and  intuitive  content,  namely  a  weighted  combination  of  the  raw 
rates  and  the  prior  mean.  As  the  variability  of  the  superpopulation  decreases,  more 
weight  is  put  on  the  superpopulation  mean,  g.  Both  (4.10)  and  (4.11)  exhibit  the 
tendency  for  linear  shrinkage  of  the  raw  estimate  vector  toward  g,  irrespective  of  the 
relation  of  the  log  observed  rates  to  the  superpopulation  center.  Such  linear 
shrinkage  behavior  also  characterizes  estimates  obtained  from  the  conjugate  gamma 
superpopulations.  it  appears  to  be  inappropriate  for  highly  discrepant  observations 
that  may  occur.  Some  observed  rates  may  actually  choose  not  to  conforn  to  tne 
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fU!*) 


no^aix,*r,exp<-/|xj>  xj  i  o.  o  i  j  i  n. 


(4.3) 


The  hyperparameters  are  specified  by  £  *  (g.^fl).  For  a  likelihood  function  given  by 
(4.2),  the  posterior  distribution  of  _<  given  (g(t),  0  i  t  i  T}  is  given  by 


f(X!g.  » 


n 

‘ii 

j** 


flj  +  M 


x(  >  0,  0  *  j  in 


(4.4) 


The  posterior  distribution  has  independent  gamma  marginals.  It  is  simple  to  estimate 
any  of  the  individual  X^  or  some  function  of  them.  The  posterior  mean  of  X  is 
given  by  (a  *^N  ■).  while  the  posterior  mode  is  given  by  (a  +M -WH/I +W jl 

provided  ±  1.  The  posterior  mode  is  seen  to  resemble  the  mean,  and  it  turns 

out  to  be  a  convenient  approximation.  We  assume  that  the  hyperparameters  a  = 

. and  §  =  (ftQ ^n)  are  known,  presumably  by  elicitation  and  relevant 

experience. 

4.3.2  Multivariate  Log-Normal  Prior 

The  previous  choice  of  conjugate  prior  offers  considerable  mathematical  tractability; 
however,  it  does  not  permit  the  X  parameters  to  be  correlated.  Furthermore,  a 
common  choice  of  prior  for  univariate  Poisson  process  data  in  reliability  and 
probabilistic  risk  assessment  studies  is  a  log-normal  distribution,  see  Rasmussen 
(1975).  Hill,  Heger  and  Koen  (1984),  or  Kaplan  (1983).  We  introduce  »  =  logX  and  let 
L  =  <i0r-.rn)T  -  N(B,Z)  where  §  is  an  n+1  dimensional  mean  vector  and  X  is  an  (n+1>  x 
<n*1)  positive  definite  covariance  matrix.  This  formulation  provides  both  log-normal 
marginals  and  the  possibility  of  correlated  components  in  a  familiar  fashion. 

The  log-posterior  distribution  is  obtained  from  the  prior  and  equation  (4.2)  and  is 
given  by 


log  f(X  |  jj.Xi  =log  C  -  (<-^)TZ' ’(*  V'2  -  XT-W«  £TM 


where  C  is  a  normalization  constant  and  W=  (WQ . Wn)T  and  M  =  <M0,...,Mn)T 

It  follows  that  log  f(X|g,  X)  is  given,  up  to  constants,  by 

log  f(X!j»,X)  *  -< jf -ju)T  -  X7W*  (4.6) 

One  can  consider  finding  the  posterior  mode  by  differentiating  (4.6)  with  respect  to 
«.  The  first  derivative  of  (4.6)  with  respect  to  »  is 
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m 

where  W,  =  X  S.f,  (§(,))  . 

t=  i 

The  sufficient  statistics  are  given  by  <(yi  ,  W  )  where  (yi  =  (M0 Mn)T  and  W  = 

<W0,...,Wn)T.  The  likelihood  function  is  of  the  multivariate  independent  Poisson  type: 

exp(-Z  X  W.>n  (4.2) 

j=o  1  ‘  j=o  1 

4.2  Random  Parameter  Processes 

The  likelihood  given  in  equation  (4.2)  is  relevant  to  a  single  version  or  realization  of 
a  simple  Markov  population  process  (SMPP).  In  many  circumstances,  it  is  reasonable 
to  suppose  that  the  rate  parameters  X  =  <  X  o,...,  X  n)T  occasionally  change,  for  example 
in  response  to  external  events.  Assume  we  are  able  to  observe  K  such  different 
versions  and  wish  to  estimate  the  individual  parameters  on  the  supposition  that 
important  between-version  variability  may  exist.  The  simplest  plausible  way  to 
proceed  is  to  introduce  a  superpopulation  of  parameters  X  having  density  f(X|£) 

where  f  denotes  a  hyperparameter  vector.  The  K  observed  sample  paths  are  then 
analyzed  as  if  the  parameters  of  each  version  were  selected  at  random  using  the 
density  f.  Specifically,  let  X(1>,...,XlK>  be  i.i.d.  random  vectors  with  density  f(X,>). 
Here  X(k)  corresponds  to  the  rate  parameters  for  the  k,h  observed  SMPP,  {Xk(t), 

°<t<Tk). 

The  assumption  of  independent  sampling  allows  one  to  simply  construct  an  overall 
likelihood  that  incorporates  the  information  as  well  as  that  in  the  superpopulation 

density  f. 

4.3  Bayesian  Inference 

Simple  Bavesian  inference  assumes  the  superpopulation  density  f  to  be  completely 
specified.  in  the  case  of  a  superpopulation  density  f(X|£),  the  hyperparameters 

£  would  be  treated  as  known,  presumably  by  elicitation.  Estimation  of  X,k)  is 
accomplished  Dy  finding  the  posterior  density  of  X<k)  given  { Xk(t),  0£t<[Tk}  and  then 
computing  some  appropriate  estimator  such  as  the  posterior  mean  vector  or  the 
posterior  mode  vector.  In  this  section,  we  consider  three  particular  parametric  priors. 

4.3.1  Conjugate  Gamma  Prior 

Trie  easiest  situation  occurs  when  we  introduce  a  multivariate  gamma  prior 
distribution  with  independent  marginals.  Specifically,  we  assume 
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